Psy 633
Hypothesis Testing, Effect Size, & Power
I. Logic of Hypothesis Testing
A. Hypothesis Testing
- an inferential procedure that uses sample data to evaluate a hypothesis about a population
- general scheme
1. State hypothesis about population
2. obtain random sample
3. Compare sample statistic to hypothesized population parameter: (M - µ), or (X - µ) if n = 1
- assumption: if the treatment has an effect, it adds a constant to each score
- first, suppose that n = 1
B. Procedure
1. State Hypothesis
Ho = null hypothesis: the treatment has no effect
H1 = alternative (or experimental) hypothesis: the treatment has an effect
2. Set criteria for decision
- there is always some discrepancy between sample stats and pop. parameters
- sampling error
--t distribution
- generally not normal - flattened and stretched out
- approximates normal in the way that t approximates z
- shape determined by df
Degrees of Freedom
- df = n - 1
- the greater n is, the more closely s represents σ, and the better t represents z
3. Collect sample data
calculate test statistic--Single Sample t-test
Formula:
t = (M - µ) / sM, where sM = s / √n is the estimated standard error
4. Evaluate Null hypothesis
Reject Ho
Retain Ho (Fail to reject Ho)
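The four steps above can be sketched in Python. This is an illustrative example only: the sample scores and the hypothesized µ = 50 are made-up numbers, not values from the notes.

```python
import math

sample = [52, 48, 55, 51, 49, 53, 50, 54]  # hypothetical scores
mu = 50                                    # population mean under Ho
n = len(sample)
M = sum(sample) / n                        # sample mean
# sample standard deviation, using df = n - 1
s = math.sqrt(sum((x - M) ** 2 for x in sample) / (n - 1))
s_M = s / math.sqrt(n)                     # estimated standard error
t = (M - mu) / s_M                         # single-sample t statistic
df = n - 1
print(f"M = {M:.2f}, s = {s:.2f}, t({df}) = {t:.3f}")
# -> M = 51.50, s = 2.45, t(7) = 1.732
```

The resulting t(7) = 1.732 would then be compared against the critical value for the chosen alpha level to decide whether to reject or retain Ho.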
II. Evaluating Hypotheses
A. Alpha Level (α)
- minimize risk of type I error
1. determine what data are expected if Ho true
2. determine what data are unlikely if Ho true
3. use distribution of sample means separated into two parts
- Xbar or M expected (hi prob) if Ho true
- Xbar or M unlikely (low prob) if Ho true
4. The alpha level defines which sample outcomes count as "very unlikely" to obtain by chance (e.g., the extreme 5% of the distribution)
- Xbar or M compatible with middle of distribution
- Xbar or M compatible with extremes of distribution
5. When M falls into the tails (the critical region), we reject Ho
- such a sample would be very unlikely if the treatment had no effect
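Steps 4-5 above can be sketched as a decision rule. The degrees of freedom, the critical value (taken from a standard t table), and the observed t are all hypothetical numbers for illustration.

```python
alpha = 0.05
df = 15            # hypothetical: a sample of n = 16
t_crit = 2.131     # from a t table: two-tailed, alpha = .05, df = 15

t_obs = 2.40       # hypothetical observed t statistic

# M falls in the critical region (either tail) when |t| exceeds t_crit
reject = abs(t_obs) > t_crit
print("reject Ho" if reject else "retain Ho")
# -> reject Ho
```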
B. Assumptions for Parametric Tests
1. normality--the DV must be normally distributed
2. independent observations
3. homogeneity of variance, σ not changed by treatment
4. interval or ratio scale for the DV
C. One-tailed test--Critical region in only one tail
- reject Ho with a smaller difference between M and µ
- more "sensitive"
- increase the possibility of Type I error (false alarm)
III. Errors in Hypothesis Testing
A. Type I error - reject Ho when true
B. Type II error - fail to reject Ho when false
C. Power
- the probability of detecting a treatment effect when one is indeed present.
- power is the opposite of Type II error (when a treatment effect really exists in the population)
- power = 1 – (Type II error) = 1 – β
- as Type II error decreases, power increases
- by decreasing Type I error (moving alpha from .05 to .01) we directly increase Type II error (and thereby decrease power)
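The alpha-power trade-off above can be illustrated numerically for the simple case of a one-sample z test (σ known), using the standard library's NormalDist. The effect size d = 0.5 and n = 25 are hypothetical values chosen for illustration.

```python
import math
from statistics import NormalDist

z = NormalDist()  # standard normal distribution

def power(alpha, d, n, tails=2):
    """Approximate power of a one-sample z test at effect size d and
    sample size n (upper tail only; the far tail's contribution is
    negligible for these values)."""
    z_crit = z.inv_cdf(1 - alpha / tails)
    return 1 - z.cdf(z_crit - d * math.sqrt(n))

d, n = 0.5, 25  # hypothetical medium effect, sample of 25
print(f"power at alpha = .05: {power(0.05, d, n):.3f}")
print(f"power at alpha = .01: {power(0.01, d, n):.3f}")
```

Moving alpha from .05 to .01 shrinks the critical region, so the same treatment effect is harder to detect and power drops.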
The Relationship between Power and Sample Size
- a larger n gives a smaller standard error, so the same treatment effect is easier to detect: as n increases, power increases
D. Effect Size
Important limitation of the hypothesis testing procedure:
It makes a relative comparison: the size of the treatment effect relative to the difference expected by chance. If the standard error is very small, then the treatment effect can also be very small and still be bigger than chance. Therefore, a significant effect does not necessarily mean a big effect.
Also, if the sample size is large enough, any treatment effect, no matter how small, can be enough for us to reject the null hypothesis.
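The point above can be shown with a quick calculation: hold a tiny half-point effect fixed and watch t grow with √n. The values (µ = 50, M = 50.5, s = 10) are made up, and 1.96 is used as the large-df (z) two-tailed .05 cutoff.

```python
import math

mu, M, s = 50.0, 50.5, 10.0   # hypothetical: a half-point effect, SD = 10

for n in (25, 100, 2500):
    t = (M - mu) / (s / math.sqrt(n))
    verdict = "significant" if abs(t) > 1.96 else "not significant"
    print(f"n = {n:5d}: t = {t:.2f} -> {verdict}")
# -> n =    25: t = 0.25 -> not significant
# -> n =   100: t = 0.50 -> not significant
# -> n =  2500: t = 2.50 -> significant
```

The treatment effect never changes; only the standard error shrinks. This is why effect size must be reported alongside significance.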
Figure 8-11 (p. 262): The appearance of a 15-point treatment effect in two different situations. In part (a), the standard deviation is σ = 100 and the 15-point effect is relatively small. In part (b), the standard deviation is σ = 15 and the 15-point effect is relatively large. Cohen's d uses the standard deviation to help measure effect size.
Calculating effect size: Cohen's d = (M - µ) / s (the mean difference divided by the standard deviation)
0 < d < 0.2 -- Small effect
0.2 < d < 0.8 -- Medium effect
d > 0.8 -- Large effect
Alternative effect size for t-tests: r² = t² / (t² + df)
Advantage to this one is that people are familiar with it.
Values range from 0.00 to 1.00.
What proportion of the total variability in the scores is accounted for by the treatment?
.09 and below -- Small effect
between .09 & .25 -- Medium effect
over .25 -- Large effect
The Relationship between Power and Effect Size
Think about the following:
Suppose that a researcher normally uses an alpha level of .01 for hypothesis tests, but this time uses an alpha level of .05.
a) What does this change in alpha level do to the amount of power?
b) What does this change in alpha do to the risk of a Type I error?